We describe two nonconventional algorithms for linear regression, called GAME
and CLASH. The salient characteristic of these approaches is that they exploit the
convex $\ell_1$-ball and non-convex $\ell_0$-sparsity constraints jointly in sparse recovery. To
establish the theoretical approximation guarantees of GAME and CLASH, we cover
an interesting range of topics from game theory and from convex and combinatorial
optimization. We illustrate that these approaches lead to improved theoretical guarantees
and empirical performance beyond convex and non-convex solvers alone.


1.1 Introduction 


Sparse approximation is a fundamental problem in compressed sensing, as
well as in many other signal processing and machine learning applications including
variable selection in regression, graphical model selection [7], and sparse
principal component analysis. In sparse approximation, one is provided with


*. Authors are in alphabetical order. 
a dimensionality-reducing measurement matrix $\Phi \in \mathbb{R}^{M \times N}$ ($M < N$), and a
low-dimensional vector $f \in \mathbb{R}^M$ such that:

$$f = \Phi\alpha^* + n, \tag{1.1}$$


where $\alpha^* \in \mathbb{R}^N$ is the high-dimensional signal of interest and $n \in \mathbb{R}^M$ is a potential
additive noise term with $\|n\|_2 \le \sigma$.

In this work, we assume $\alpha^*$ is a $k$-sparse signal or is sufficiently approximated
by a $k$-sparse vector. The goal of sparse approximation algorithms is then to find
a sparse vector $\hat{\alpha} \in \mathbb{R}^N$ such that $\Phi\hat{\alpha} - f$ is small in an appropriate norm. In
this setting, the $\ell_0$-minimization problem emerges naturally as a suitable solver to
recover $\alpha^*$ in (1.1):

$$\underset{\alpha \in \mathbb{R}^N}{\text{minimize}} \ \|\alpha\|_0 \quad \text{subject to} \quad \|f - \Phi\alpha\|_2 \le \sigma, \tag{1.2}$$

where $\|\alpha\|_0$ counts the nonzero elements (the sparsity) of $\alpha$.


Unfortunately, solving (1.2) is a challenging task with exponential time complexity.
Representing the set of all $k$-sparse vectors as:

$$\Delta_{\ell_0}(k) = \{\alpha \in \mathbb{R}^N : \|\alpha\|_0 \le k\}, \tag{1.3}$$

hard thresholding algorithms [13] abandon this approach in favor
of greedy selection where a putative $k$-sparse solution is iteratively refined using
local decision rules. To this end, hard thresholding methods consider the following
$\ell_0$-constrained least squares problem formulation as an alternative to (1.2):

$$\underset{\alpha \in \mathbb{R}^N}{\text{minimize}} \ \|f - \Phi\alpha\|_2^2 \quad \text{subject to} \quad \alpha \in \Delta_{\ell_0}(k). \tag{1.4}$$


These methods feature computational advantages and also are backed up with a 
great deal of theory for estimation guarantees. 

In contrast, convex optimization approaches change the problem formulations
above by "convexifying" the combinatorial $\ell_0$-constraint with the sparsity-inducing
convex $\ell_1$-norm.^1 As a result, (1.2) is transformed into the $\ell_1$-minimization, also
known as the Basis Pursuit (BP) problem [15]:

$$\underset{\alpha \in \mathbb{R}^N}{\text{minimize}} \ \|\alpha\|_1 \quad \text{subject to} \quad \|f - \Phi\alpha\|_2 \le \sigma. \tag{1.5}$$

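Similarly, the famous Lasso algorithm can be considered as a relaxation of (1.4):

$$\underset{\alpha \in \mathbb{R}^N}{\text{minimize}} \ \|f - \Phi\alpha\|_2^2 \quad \text{subject to} \quad \alpha \in \Delta_{\ell_1}(\tau), \tag{1.6}$$

where $\Delta_{\ell_1}(\tau)$ is the set of all vectors inside the hyper-diamond of radius $\tau$:

$$\Delta_{\ell_1}(\tau) = \{\alpha \in \mathbb{R}^N : \|\alpha\|_1 \le \tau\}. \tag{1.7}$$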


1. Note that this is not a true convexification, since the $\ell_0$-ball does not have a scale.

Figure 1.1: Geometric interpretation of the selection process for a simple
test case $f = \Phi\alpha^*$, where $\|\alpha^*\|_0 = 1$.


While both convex and non-convex problem formulations can find the true problem
solution under various theoretical assumptions, one can easily find examples in
practice where either one can fail. Borrowing from the literature, we provide an
illustrative example for the noiseless case in Fig. 1.1. In (1.2), combinatorial
approaches can identify the admissible set of 1-sparse solutions. If a greedy selection
rule is used to arbitrate these solutions, then such an approach could pick (A). In
contrast, the BP algorithm selects a solution (B), and misses the candidate solution
(A) as it cannot exploit prior knowledge concerning the discrete structure of $\alpha^*$.

To motivate our discussion in this book chapter, let us assume that we have
the true model parameters $\|\alpha^*\|_0 = k$ and $\|\alpha^*\|_1 = \tau$. Let us then consider
geometrically the unfortunate but common case where the kernel of $\Phi$, $\ker(\Phi)$,
intersects with the tangent cone $\mathcal{T}_{\|\alpha\|_1 \le \tau}(\alpha^*) = \{s(y - \alpha^*) : \|y\|_1 \le \tau \text{ and } s \ge 0\}$
at the true vector $\alpha^*$ (cf. (E) in Fig. 1.1(b)). From the Lasso perspective, we are
stuck with the large continuum of solutions based on the geometry, as described by
the set $\mathcal{I} = \ker(\Phi) \cap \mathcal{T}_{\|\alpha\|_1 \le \tau}(\alpha^*)$, as illustrated in Figure 1.1(b) within the box.

Without further information about the discrete nature of $\alpha^*$, a convex optimization
algorithm solving the Lasso problem can arbitrarily select a vector from $\mathcal{I}$. By
forcing basic solutions in optimization, we can reduce the size of the solution space
to $\mathcal{L} = \mathcal{I} \cap \{\|\alpha\|_1 = 1\}$, which is constituted by the sparse vectors (C) and (E). Note
that $\mathcal{L}$ might still be large in high dimensions. However, in this scenario, by adding the
$\Delta_{\ell_0}(k)$ constraint, we can make precise selections (e.g., exactly 1-sparse), significantly
reduce the candidate solution set, and, in many cases, obtain the correct
solution (E) if we leverage the norm constraint.

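Contents of this book chapter: Within this context, we describe two efficient
sparse approximation algorithms, called GAME and Clash, that operate over
sparsity and $\ell_1$-norm constraints. They address the following nonconvex problem:

$$\underset{\alpha \in \Delta_{\ell_0,\ell_1}(k,\tau)}{\text{minimize}} \ \|\Phi\alpha - f\|_q, \tag{1.8}$$

where $\Delta_{\ell_0,\ell_1}(k,\tau)$ is the set of all $k$-sparse vectors in $\Delta_{\ell_1}(\tau)$:

$$\Delta_{\ell_0,\ell_1}(k,\tau) = \{\alpha \in \mathbb{R}^N : \|\alpha\|_0 \le k \text{ and } \|\alpha\|_1 \le \tau\}. \tag{1.9}$$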
To introduce the Game-theoretic Approximate Matching Estimator (GAME)
method, we reformulate (1.8) as a zero-sum game. GAME then efficiently obtains
a sparse approximation for the optimal game solution. GAME employs a primal-dual
scheme, and requires $O(k)$ iterations in order to find a $k$-sparse vector with
$O(k^{-1/2})$ additive approximation error.

To introduce the Combinatorial selection and Least Absolute SHrinkage operator
(Clash), we recall hard thresholding methods and explain how to incorporate the
$\ell_1$-norm constraint. A key feature of the Clash approach is that it allows us
to exploit ideas from the model-based compressive sensing (model-CS) approach,
where selections can be driven by a structured sparsity model [18, 19].

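We emphasize again that since $\Delta_{\ell_0,\ell_1}(k,\tau)$ is not convex, the optimization
problem (1.8) is not a convex optimization problem. However, we can still derive
theoretical approximation guarantees for both algorithms. For instance, we can prove
that for every dimension-reducing matrix $\Phi$, and every measurement vector $f$,
GAME can find a vector $\hat{\alpha} \in \Delta_{\ell_0,\ell_1}(k,\tau)$ with

$$\|\Phi\hat{\alpha} - f\|_q \le \min_{\alpha \in \Delta_{\ell_0,\ell_1}(k,\tau)} \|\Phi\alpha - f\|_q + O\!\left(\frac{1}{\sqrt{k}}\right), \tag{1.10}$$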


where $q$ is a positive integer. This sparse approximation framework surprisingly
works for any matrix $\Phi$. Compared to the GAME algorithm, Clash requires
stronger assumptions on the measurement matrix for estimation guarantees. However,
these assumptions, in the end, lead to improved empirical performance.


1.2 Preliminaries 

Here, we cover the basic mathematical background that is used in establishing
algorithmic guarantees in the sequel.

1.2.1 Bregman Projections 

Bregman divergences or Bregman distances are an important family of distances
that all share similar properties [20, 21].

Definition 1.1 (Bregman Distance). Let $\mathcal{R} : \mathcal{S} \to \mathbb{R}$ be a continuously-differentiable
real-valued and strictly convex function defined on a closed convex set $\mathcal{S}$. The
Bregman distance associated with $\mathcal{R}$ for points $P$ and $Q$ is:

$$\mathcal{B}_{\mathcal{R}}(P, Q) = \mathcal{R}(P) - \mathcal{R}(Q) - \langle (P - Q), \nabla\mathcal{R}(Q) \rangle.$$

Table 1.1 summarizes examples of the most widely used Bregman functions and
the corresponding Bregman distances.

The Bregman distance has several important properties that we will use later in
analyzing our sparse approximation algorithm.








Figure 1.2: The Bregman divergence associated with a continuously-differentiable
real-valued and strictly convex function $\mathcal{R}$ is the vertical distance
at $P$ between the graph of $\mathcal{R}$ and the line tangent to the graph of $\mathcal{R}$ in $Q$.


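Table 1.1: Summary of the most popular Bregman functions and their
corresponding Bregman distances. Here $\Phi$ is a positive semidefinite matrix.

Name                  Bregman function $\mathcal{R}(P)$      Bregman distance $\mathcal{B}_{\mathcal{R}}(P, Q)$
Squared Euclidean     $\|P\|_2^2$                            $\|P - Q\|_2^2$
Squared Mahalanobis   $\langle P, \Phi P \rangle$            $\langle (P - Q), \Phi(P - Q) \rangle$
Entropy               $\sum_i P_i \log P_i - P_i$            $\sum_i P_i \log\frac{P_i}{Q_i} - \sum_i (P_i - Q_i)$
Itakura-Saito         $\sum_i -\log P_i$                     $\sum_i \left(\frac{P_i}{Q_i} - \log\frac{P_i}{Q_i} - 1\right)$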

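Theorem 1.2. The Bregman distance satisfies the following properties:

■ (P1). $\mathcal{B}_{\mathcal{R}}(P, Q) \ge 0$, and equality holds if and only if $P = Q$.

■ (P2). For every fixed $Q$, if we define $g(P) = \mathcal{B}_{\mathcal{R}}(P, Q)$, then

$$\nabla g(P) = \nabla\mathcal{R}(P) - \nabla\mathcal{R}(Q).$$

■ (P3). Three point property: For every $P$, $Q$ and $T$ in $\mathcal{S}$,

$$\mathcal{B}_{\mathcal{R}}(P, Q) = \mathcal{B}_{\mathcal{R}}(P, T) + \mathcal{B}_{\mathcal{R}}(T, Q) + \langle (P - T), \nabla\mathcal{R}(T) - \nabla\mathcal{R}(Q) \rangle.$$

■ (P4). For every $P, Q \in \mathcal{S}$,

$$\mathcal{B}_{\mathcal{R}}(P, Q) + \mathcal{B}_{\mathcal{R}}(Q, P) = \langle (P - Q), (\nabla\mathcal{R}(P) - \nabla\mathcal{R}(Q)) \rangle.$$

Proof. All four properties follow directly from Definition 1.1. □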
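
The properties of the Bregman distance can be checked numerically. The following is a minimal sketch, assuming NumPy and two of the Bregman functions from Table 1.1 (squared Euclidean and entropy); the helper names `bregman`, `three_point_gap` and `symmetrized_gap` are illustrative and not part of the original text. The three point property is evaluated with the inner-product term written as $\langle P - T, \nabla\mathcal{R}(T) - \nabla\mathcal{R}(Q)\rangle$, which is the sign obtained by expanding the definition of the Bregman distance.

```python
import numpy as np

def bregman(R, grad_R, P, Q):
    """Bregman distance B_R(P, Q) = R(P) - R(Q) - <P - Q, grad R(Q)>."""
    return R(P) - R(Q) - np.dot(P - Q, grad_R(Q))

# Squared Euclidean Bregman function and its gradient.
sq = lambda P: np.dot(P, P)
sq_grad = lambda P: 2.0 * P

# Entropy Bregman function (strictly positive vectors only) and its gradient.
ent = lambda P: np.sum(P * np.log(P) - P)
ent_grad = lambda P: np.log(P)

rng = np.random.default_rng(0)
P, Q, T = (rng.uniform(0.1, 1.0, size=5) for _ in range(3))

def three_point_gap(R, g):
    """Residual of the three point property (P3); zero up to rounding."""
    lhs = bregman(R, g, P, Q)
    rhs = bregman(R, g, P, T) + bregman(R, g, T, Q) + np.dot(P - T, g(T) - g(Q))
    return lhs - rhs

def symmetrized_gap(R, g):
    """Residual of property (P4); zero up to rounding."""
    lhs = bregman(R, g, P, Q) + bregman(R, g, Q, P)
    return lhs - np.dot(P - Q, g(P) - g(Q))

for R, g in [(sq, sq_grad), (ent, ent_grad)]:
    print(bregman(R, g, P, Q) >= 0.0, three_point_gap(R, g), symmetrized_gap(R, g))
```

For the squared Euclidean function the same helper reproduces $\|P - Q\|_2^2$, which matches the first row of Table 1.1.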

Now that we are equipped with the properties of Bregman distances, we are ready
to define Bregman projections of points onto convex sets.

Definition 1.3 (Bregman Projection). Let $\mathcal{R} : \mathcal{S} \to \mathbb{R}$ be a continuously-differentiable
real-valued and strictly convex function defined on a closed convex
set $\mathcal{S}$. Let $\Omega$ be a closed subset of $\mathcal{S}$. Then, for every point $Q$ in $\mathcal{S}$, the Bregman
projection of $Q$ onto $\Omega$, denoted as $\mathcal{P}_\Omega(Q)$, is

$$\mathcal{P}_\Omega(Q) = \arg\min_{P \in \Omega} \mathcal{B}_{\mathcal{R}}(P, Q).$$

Bregman projections satisfy a generalized Pythagorean Theorem.

Theorem 1.4 (Generalized Pythagorean Theorem [10]). Let $\mathcal{R} : \mathcal{S} \to \mathbb{R}$ be a
continuously-differentiable real-valued and strictly convex function defined on a
closed convex set $\mathcal{S}$. Let $\Omega$ be a closed convex subset of $\mathcal{S}$. Then for every $P \in \Omega$ and
$Q \in \mathcal{S}$

$$\mathcal{B}_{\mathcal{R}}(P, Q) \ge \mathcal{B}_{\mathcal{R}}(P, \mathcal{P}_\Omega(Q)) + \mathcal{B}_{\mathcal{R}}(\mathcal{P}_\Omega(Q), Q), \tag{1.11}$$

and in particular

$$\mathcal{B}_{\mathcal{R}}(P, Q) \ge \mathcal{B}_{\mathcal{R}}(P, \mathcal{P}_\Omega(Q)). \tag{1.12}$$

We refer the reader to the references cited above for a proof of this theorem and
further discussions.

1.2.2 Euclidean Projections onto the $\ell_0$ and the $\ell_1$-ball

Here, we describe two key actors in sparse approximation.

Projections onto combinatorial sets: The Euclidean projection of a signal
$w \in \mathbb{R}^N$ onto the set $\Delta_{\ell_0}(k)$ is provided by:

$$\mathcal{P}_{\Delta_{\ell_0}(k)}(w) = \arg\min_{\alpha : \alpha \in \Delta_{\ell_0}(k)} \|\alpha - w\|_2, \tag{1.13}$$

whose solution is hard thresholding. That is, we sort the coefficients of $w$ in
decreasing magnitude, keep the top $k$, and threshold the rest away. This
operation can be done in $O(N \log N)$ time via simple sorting routines.

Projections onto convex norms: Given $w \in \mathbb{R}^N$, the Euclidean projection
onto a convex $\ell_1$-norm ball of radius at most $\tau$ defines the optimization problem:

$$\mathcal{P}_{\Delta_{\ell_1}(\tau)}(w) = \arg\min_{\alpha : \alpha \in \Delta_{\ell_1}(\tau)} \|\alpha - w\|_2, \tag{1.14}$$

whose solution is soft thresholding. That is, we decrease the magnitude of all the
coefficients by a constant value just enough to meet the $\ell_1$-norm constraint. A
solution can be obtained in $O(N \log N)$ time with simple sorting routines,
similar to above.
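
Both projections can be sketched in a few lines. The code below is a minimal illustration, assuming NumPy; the function names `hard_threshold` and `project_l1_ball` are hypothetical, and the $\ell_1$ projection uses the standard sort-based search for the soft-thresholding level, which is one common way to realize the operation described above and is not necessarily the routine used by the authors.

```python
import numpy as np

def hard_threshold(w, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude entries."""
    w = np.asarray(w, dtype=float)
    alpha = np.zeros_like(w)
    if k > 0:
        keep = np.argsort(np.abs(w))[-k:]   # indices of the k largest magnitudes
        alpha[keep] = w[keep]
    return alpha

def project_l1_ball(w, tau):
    """Projection onto {alpha : ||alpha||_1 <= tau} (tau > 0) by soft thresholding."""
    w = np.asarray(w, dtype=float)
    if np.abs(w).sum() <= tau:
        return w.copy()                      # already feasible
    u = np.sort(np.abs(w))[::-1]             # magnitudes in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, u.size + 1)
    rho = np.nonzero(u - (css - tau) / idx > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)   # common shrinkage amount
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

w = np.array([3.0, -1.0, 0.5, 2.0])
print(hard_threshold(w, 2))      # keeps the entries 3.0 and 2.0
print(project_l1_ball(w, 2.0))   # shrinks all magnitudes by 1.5
```

In this example the shrinkage level is 1.5, so the projected vector is (1.5, 0, 0, 0.5) and its $\ell_1$-norm equals the radius 2.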

1.2.3 Restricted Isometry Property 

In order to establish stronger theoretical guarantees for the algorithms, it is necessary
to use the Restricted Isometry Property (RIP) assumption. For each pair of positive
integers $q$ and $k$, and each $\epsilon$ in $(0,1)$, an $M \times N$ matrix $\Phi$ satisfies the $(k, \epsilon)$ RIP
in $\ell_q$ norm ($(k, \epsilon)$ RIP-$q$) [23], if for every $k$-sparse vector $\alpha$,

$$(1 - \epsilon)\|\alpha\|_q \le \|\Phi\alpha\|_q \le (1 + \epsilon)\|\alpha\|_q.$$

This assumption implies a near-isometric embedding of the sparse vectors by the
matrix $\Phi$. We just briefly mention that such matrices can be constructed randomly
using certain classes of distributions [23].
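
As an illustration of the near-isometry, the sketch below samples random $k$-sparse vectors and records the ratio $\|\Phi\alpha\|_q / \|\alpha\|_q$ for a Gaussian matrix with variance-$1/M$ entries. It assumes NumPy, and the function name `empirical_isometry_ratios` and the problem sizes are illustrative choices. Random sampling only gives an optimistic estimate: it does not certify the RIP, which is a statement about every $k$-sparse vector.

```python
import numpy as np

def empirical_isometry_ratios(Phi, k, trials=200, q=2, seed=0):
    """Smallest and largest ||Phi a||_q / ||a||_q over random k-sparse vectors a."""
    rng = np.random.default_rng(seed)
    N = Phi.shape[1]
    ratios = np.empty(trials)
    for t in range(trials):
        alpha = np.zeros(N)
        support = rng.choice(N, size=k, replace=False)   # random support of size k
        alpha[support] = rng.standard_normal(k)
        ratios[t] = np.linalg.norm(Phi @ alpha, q) / np.linalg.norm(alpha, q)
    return ratios.min(), ratios.max()

rng = np.random.default_rng(1)
M, N, k = 200, 1000, 20
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # columns have unit norm on average
lo, hi = empirical_isometry_ratios(Phi, k)
print(lo, hi)   # both values are expected to lie close to 1 for q = 2
```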


1.3 The GAME Algorithm 

1.3.1 A Game Theoretic Reformulation of Sparse Approximation 


We start by defining a zero-sum game and then proving that the sparse approximation
problem of Equation (1.8) can be reformulated as a zero-sum game.


Definition 1.5 (Zero-sum games [25]). Let $\mathcal{A}$ and $\mathcal{B}$ be two closed sets. Let
$\mathcal{L} : \mathcal{A} \times \mathcal{B} \to \mathbb{R}$ be a function. The value of a zero-sum game with domains $\mathcal{A}$
and $\mathcal{B}$ with respect to the function $\mathcal{L}$ is defined as

$$\min_{a \in \mathcal{A}} \max_{b \in \mathcal{B}} \mathcal{L}(a, b). \tag{1.15}$$

The function $\mathcal{L}$ is usually called the loss function. A zero-sum game can be viewed
as a game between two players, Mindy and Max, in the following way. First, Mindy
finds a vector $a$, and then Max finds a vector $b$. The loss that Mindy suffers^2 is
$\mathcal{L}(a, b)$. The game-value of a zero-sum game is then the loss that Mindy suffers if
both Mindy and Max play with their optimal strategies.

Von Neumann's well-known Minimax Theorem states that if both $\mathcal{A}$ and
$\mathcal{B}$ are convex compact sets, and if the loss function $\mathcal{L}(a, b)$ is convex with respect
to $a$, and concave with respect to $b$, then the game-value is independent of the
ordering of the game players.

Theorem 1.6 (Von Neumann's Minimax Theorem). Let $\mathcal{A}$ and $\mathcal{B}$ be closed
convex sets, and let $\mathcal{L} : \mathcal{A} \times \mathcal{B} \to \mathbb{R}$ be a function which is convex with respect to
its first argument, and concave with respect to its second argument. Then

$$\inf_{a \in \mathcal{A}} \sup_{b \in \mathcal{B}} \mathcal{L}(a, b) = \sup_{b \in \mathcal{B}} \inf_{a \in \mathcal{A}} \mathcal{L}(a, b).$$

2. which is equal to the gain that Max obtains as the game is zero-sum.

For the history of the Minimax Theorem see [28]. The Minimax Theorem tells
us that for a large class of functions $\mathcal{L}$, the value of the min-max game in which
Mindy goes first is identical to the value of the max-min game in which Max starts
the game. The proof of the Minimax Theorem is provided in [29].

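Having defined a zero-sum game and stated the Von Neumann Minimax Theorem,
we next show how the sparse approximation problem of Equation (1.8) can be
reformulated as a zero-sum game. Let $p = \frac{q}{q-1}$, and define

$$\Xi_p = \{P \in \mathbb{R}^M : \|P\|_p \le 1\}. \tag{1.16}$$

Define the loss function $\mathcal{L} : \Xi_p \times \Delta_{\ell_1}(\tau) \to \mathbb{R}$ as

$$\mathcal{L}(P, \alpha) = \langle P, (\Phi\alpha - f) \rangle. \tag{1.17}$$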


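Observe that the loss function is bilinear. Now it follows from Hölder's inequality
that for every $\alpha$ in $\Delta_{\ell_0,\ell_1}(k, \tau)$, and for every $P$ in $\Xi_p$,

$$\mathcal{L}(P, \alpha) = \langle P, (\Phi\alpha - f) \rangle \le \|P\|_p \|\Phi\alpha - f\|_q \le \|\Phi\alpha - f\|_q. \tag{1.18}$$

The inequality of Equation (1.18) becomes an equality for the vector $P^*$ with entries

$$P^*_i = \frac{\operatorname{sign}\big((\Phi\alpha - f)_i\big)\,\big|(\Phi\alpha - f)_i\big|^{q/p}}{\|\Phi\alpha - f\|_q^{q/p}}.$$

Therefore

$$\max_{P \in \Xi_p} \mathcal{L}(P, \alpha) = \max_{P \in \Xi_p} \langle P, (\Phi\alpha - f) \rangle = \langle P^*, (\Phi\alpha - f) \rangle = \|\Phi\alpha - f\|_q. \tag{1.19}$$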


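Equation (1.19) is true for every $\alpha \in \Delta_{\ell_1}(\tau)$. As a result, by taking the minimum
over $\Delta_{\ell_0,\ell_1}(k, \tau)$ we get

$$\min_{\alpha \in \Delta_{\ell_0,\ell_1}(k,\tau)} \|\Phi\alpha - f\|_q = \min_{\alpha \in \Delta_{\ell_0,\ell_1}(k,\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha).$$

Similarly, by taking the minimum over $\Delta_{\ell_1}(\tau)$ we get

$$\min_{\alpha \in \Delta_{\ell_1}(\tau)} \|\Phi\alpha - f\|_q = \min_{\alpha \in \Delta_{\ell_1}(\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha). \tag{1.20}$$

Solving the sparse approximation problem of Equation (1.8) is therefore equivalent
to finding the optimal strategies of the game

$$\min_{\alpha \in \Delta_{\ell_0,\ell_1}(k,\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha). \tag{1.21}$$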
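
The step from the bilinear loss to the $\ell_q$ residual norm rests on Hölder's inequality being tight for a specific dual vector. The following minimal sketch, assuming NumPy and $q > 1$, builds that maximizer for Max and checks numerically that it lies on the unit $\ell_p$ sphere and attains $\|\Phi\alpha - f\|_q$. The function name `best_response_max` and the random test data are illustrative; the code uses the identity $q/p = q - 1$.

```python
import numpy as np

def best_response_max(residual, q):
    """Vector P* in the unit l_p ball, p = q/(q-1), with <P*, r> = ||r||_q (q > 1)."""
    norm_q = np.linalg.norm(residual, q)
    if norm_q == 0.0:
        return np.zeros_like(residual)       # any feasible P is optimal here
    # q/p = q - 1, so |r_i|^(q/p) = |r_i|^(q-1).
    return np.sign(residual) * np.abs(residual) ** (q - 1) / norm_q ** (q - 1)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((5, 8))
alpha = rng.standard_normal(8)
f = rng.standard_normal(5)

q = 3.0
p = q / (q - 1.0)
r = Phi @ alpha - f
P_star = best_response_max(r, q)

print(np.dot(P_star, r), np.linalg.norm(r, q))   # the two values coincide
print(np.linalg.norm(P_star, p))                 # equals 1 up to rounding
```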
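In the next section we provide a primal-dual algorithm that approximately solves
this min-max game. Observe that since $\Delta_{\ell_0,\ell_1}(k, \tau)$ is a subset of $\Delta_{\ell_1}(\tau)$, we always
have

$$\min_{\alpha \in \Delta_{\ell_1}(\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha) \le \min_{\alpha \in \Delta_{\ell_0,\ell_1}(k,\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha),$$

and therefore, in order to approximately solve the game of Equation (1.21), it is
sufficient to find $\hat{\alpha} \in \Delta_{\ell_0,\ell_1}(k, \tau)$ with

$$\max_{P \in \Xi_p} \mathcal{L}(P, \hat{\alpha}) \approx \min_{\alpha \in \Delta_{\ell_1}(\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha). \tag{1.22}$$

1.3.2 Algorithm Description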


In this section we provide an efficient algorithm for approximately solving the
problem of sparse approximation in the $\ell_q$ norm, defined by Equation (1.10). Let
$\mathcal{L}(P, \alpha)$ be the loss function defined by Equation (1.17), and recall that in order
to approximately solve Equation (1.10), it is sufficient to find a sparse vector
$\hat{\alpha} \in \Delta_{\ell_0,\ell_1}(k, \tau)$ such that

$$\max_{P \in \Xi_p} \mathcal{L}(P, \hat{\alpha}) \approx \min_{\alpha \in \Delta_{\ell_1}(\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha). \tag{1.23}$$

The original sparse approximation problem of Equation (1.10) is NP-complete,
but it is computationally feasible to compute the value of the min-max game

$$\min_{\alpha \in \Delta_{\ell_1}(\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha). \tag{1.24}$$


The reason is that the loss function $\mathcal{L}(P, \alpha)$ of Equation (1.17) is a bilinear function,
and the sets $\Delta_{\ell_1}(\tau)$ and $\Xi_p$ are both convex and closed.

Therefore, finding the game values and optimal strategies of the game of Equation
(1.24) is equivalent to solving a convex optimization problem and can be done
using off-the-shelf non-smooth convex optimization methods. However, if
an off-the-shelf convex optimization method is used, then there is no guarantee
that the recovered strategy $\hat{\alpha}$ is also sparse. We need an approximation algorithm
that finds near-optimal strategies $\hat{\alpha}$ and $\hat{P}$ for Mindy and Max with the additional
guarantee that Mindy's near-optimal strategy $\hat{\alpha}$ is sparse.

Here we introduce the Game-theoretic Approximate Matching Estimator (GAME)
algorithm, which finds a sparse approximation to the min-max optimal solution
of the game defined in Equation (1.24). The GAME algorithm relies on the
general primal-dual approach, which was originally applied to developing strategies
for repeated games (see also [32] and [33]). The pseudocode of the GAME
Algorithm is provided in Algorithm 1.1.

Algorithm 1.1 GAME Algorithm for Sparse Approximation in $\ell_q$-norm.

Inputs: $M$-dimensional vector $f$, $M \times N$ matrix $\Phi$, number of iterations $T$, sparse
approximation norm $q$, Bregman function $\mathcal{R}$ and regularization parameter $\eta$.

Output: $N$-dimensional vector $\hat{\alpha}$.

The GAME Algorithm can be viewed as a repeated game between two players,
Mindy and Max, who iteratively update their current strategies $\alpha^t$ and $P^t$, respectively,
with the aim of ultimately finding near-optimal strategies based on a $T$-round interaction
with each other. Here, we briefly explain how each player updates his or her current
strategy based on the new update from the other player.

Recall that the ultimate goal is to find the solution of the game

$$\min_{\alpha \in \Delta_{\ell_1}(\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha).$$

At the beginning of each iteration $t$, Mindy receives the updated value $P^t$ from Max.
A greedy Mindy only focuses on Max's current strategy, and updates her current
strategy to $\alpha^t = \arg\min_{\alpha \in \Delta_{\ell_1}(\tau)} \mathcal{L}(P^t, \alpha)$. In the following lemma we show that
this is indeed what our Mindy does in the first three steps of the main loop.


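Lemma 1.7. Let $P^t$ denote Max's strategy at the beginning of iteration $t$. Let
$r^t = \Phi^\top P^t$, and let $i$ denote the index of a largest (in magnitude) element of
$r^t$. Let $\alpha^t$ be a 1-sparse vector with $\operatorname{Supp}(\alpha^t) = \{i\}$ and with $\alpha^t_i = -\tau \operatorname{Sign}(r^t_i)$.
Then $\alpha^t = \arg\min_{\alpha \in \Delta_{\ell_1}(\tau)} \mathcal{L}(P^t, \alpha)$.

Proof. Let $\tilde{\alpha}$ be any solution $\tilde{\alpha} = \arg\min_{\alpha \in \Delta_{\ell_1}(\tau)} \mathcal{L}(P^t, \alpha)$. It follows from the
bilinearity of the loss function (Equation (1.17)) that

$$\tilde{\alpha} = \arg\min_{\alpha \in \Delta_{\ell_1}(\tau)} \mathcal{L}(P^t, \alpha) = \arg\min_{\alpha \in \Delta_{\ell_1}(\tau)} \langle P^t, \Phi\alpha - f \rangle = \arg\min_{\alpha \in \Delta_{\ell_1}(\tau)} \langle \Phi^\top P^t, \alpha \rangle.$$

Moreover, Hölder's inequality yields that for every $\alpha^\# \in \Delta_{\ell_1}(\tau)$,

$$\langle \Phi^\top P^t, \alpha^\# \rangle \ge -\|\alpha^\#\|_1 \|\Phi^\top P^t\|_\infty \ge -\tau \|\Phi^\top P^t\|_\infty. \tag{1.25}$$

Now let $\alpha^t$ be a 1-sparse vector with $\operatorname{Supp}(\alpha^t) = \{i\}$ and $\alpha^t_i = -\tau \operatorname{Sign}(r^t_i)$. Then
$\alpha^t \in \Delta_{\ell_1}(\tau)$, and

$$\langle \Phi^\top P^t, \alpha^t \rangle = -\tau |r^t_i| = -\tau \|\Phi^\top P^t\|_\infty.$$

In other words, for $\alpha^t$ Hölder's inequality is an equality. Hence $\alpha^t$ is a minimizer
of $\langle \Phi^\top P^t, \alpha \rangle$. □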
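
Mindy's best response from Lemma 1.7 amounts to one matrix-vector product and one maximum search. A minimal sketch, assuming NumPy; the function name `mindy_best_response` and the random data are illustrative only.

```python
import numpy as np

def mindy_best_response(Phi, P, tau):
    """1-sparse minimizer of <Phi^T P, alpha> over the l1 ball of radius tau."""
    r = Phi.T @ P
    i = int(np.argmax(np.abs(r)))            # index of a largest-magnitude entry
    alpha = np.zeros(Phi.shape[1])
    alpha[i] = -tau * np.sign(r[i])          # all l1 mass on that coordinate
    return alpha

rng = np.random.default_rng(0)
Phi = rng.standard_normal((6, 10))
P = rng.standard_normal(6)
tau = 2.0

alpha_t = mindy_best_response(Phi, P, tau)
r = Phi.T @ P
# The attained value matches the Hoelder lower bound -tau * ||Phi^T P||_inf.
print(np.dot(r, alpha_t), -tau * np.max(np.abs(r)))
```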

Thus far we have seen that at each iteration Mindy always finds a 1-sparse
solution $\alpha^t = \arg\min_{\alpha \in \Delta_{\ell_1}(\tau)} \mathcal{L}(P^t, \alpha)$. Mindy then sends her updated strategy
$\alpha^t$ to Max, and now it is Max's turn to update his strategy. A greedy Max would
prefer to update his strategy as $P^{t+1} = \arg\max_{P \in \Xi_p} \mathcal{L}(P, \alpha^t)$. However, our Max
is more conservative and prefers to stay close to his previous value $P^t$. In other
words, Max has two competing objectives:

1. Maximizing $\mathcal{L}(P, \alpha^t)$, or equivalently minimizing $-\mathcal{L}(P, \alpha^t)$.

2. Remaining close to the previous strategy $P^t$, by minimizing $\mathcal{B}_{\mathcal{R}}(P, P^t)$.

Let

$$\mathcal{L}_{\mathcal{R}}(P) = -\eta \mathcal{L}(P, \alpha^t) + \mathcal{B}_{\mathcal{R}}(P, P^t)$$

be a regularized loss function which is a linear combination of the two objectives
above.

A conservative Max then tries to balance the two objectives
above by minimizing the regularized loss function

$$P^{t+1} = \arg\min_{P \in \Xi_p} \mathcal{L}_{\mathcal{R}}(P) = \arg\min_{P \in \Xi_p} -\eta \mathcal{L}(P, \alpha^t) + \mathcal{B}_{\mathcal{R}}(P, P^t). \tag{1.26}$$


Unfortunately, it is not so easy to efficiently solve the optimization problem of
Equation (1.26) at every iteration. To overcome this difficulty, our Max first ignores
the constraint $P^{t+1} \in \Xi_p$, and instead finds a global optimizer of $\mathcal{L}_{\mathcal{R}}(P)$ by setting
$\nabla\mathcal{L}_{\mathcal{R}}(P) = 0_M$, and then projects the result back to $\Xi_p$ via a Bregman projection.

More precisely, it follows from Property (P2) of the Bregman distance
(Theorem 1.2) that for every $P$

$$\nabla\mathcal{L}_{\mathcal{R}}(P) = -\eta(\Phi\alpha^t - f) + \nabla\mathcal{R}(P) - \nabla\mathcal{R}(P^t),$$

and therefore if $Q^t$ is a point with

$$\nabla\mathcal{R}(Q^t) = \nabla\mathcal{R}(P^t) + \eta(\Phi\alpha^t - f),$$

then $\nabla\mathcal{L}_{\mathcal{R}}(Q^t) = 0_M$.

The vector $Q^t$ is finally projected back to $\Xi_p$ via a Bregman projection to ensure
that Max's new strategy is in the feasible set $\Xi_p$.
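
The two updates can be combined into a short loop. What follows is a minimal sketch and not the authors' pseudocode: it assumes NumPy, fixes $q = p = 2$, and takes the squared Euclidean Bregman function $\mathcal{R}(P) = \|P\|_2^2$, so that $\nabla\mathcal{R}(P) = 2P$ and the Bregman projection onto the unit $\ell_2$ ball reduces to a rescaling. The function name `game_l2`, the zero initial strategy for Max, and the value of `eta` in the usage lines are all assumptions made for illustration.

```python
import numpy as np

def game_l2(f, Phi, tau, T, eta):
    """Sketch of the repeated game for q = 2 with R(P) = ||P||_2^2 (assumed)."""
    M, N = Phi.shape
    P = np.zeros(M)                  # Max's starting strategy (assumed)
    alpha_sum = np.zeros(N)
    for _ in range(T):
        # Mindy: 1-sparse best response to the current P.
        r = Phi.T @ P
        i = int(np.argmax(np.abs(r)))
        alpha = np.zeros(N)
        alpha[i] = -tau if r[i] > 0 else tau
        alpha_sum += alpha
        # Max: solve grad R(Q) = grad R(P) + eta * (Phi alpha - f) with grad R(P) = 2P.
        Q = P + 0.5 * eta * (Phi @ alpha - f)
        # Bregman (here Euclidean) projection back onto the unit l2 ball.
        P = Q / max(1.0, np.linalg.norm(Q))
    return alpha_sum / T             # average of Mindy's strategies

rng = np.random.default_rng(0)
Phi = rng.standard_normal((30, 60)) / np.sqrt(30)
alpha_true = np.zeros(60)
alpha_true[[3, 11]] = [1.0, -1.0]
f = Phi @ alpha_true
alpha_hat = game_l2(f, Phi, tau=2.0, T=10, eta=0.5)
print(np.count_nonzero(alpha_hat), np.abs(alpha_hat).sum())
```

Because every round contributes a single coordinate with $\ell_1$ mass $\tau$, the averaged output has at most $T$ nonzero entries and $\ell_1$-norm at most $\tau$ regardless of how `eta` is chosen; the quality of the residual does depend on that choice.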

1.3.3 The GAME Guarantees 


In this section we prove that the GAME algorithm finds a near-optimal solution for
the sparse approximation problem of Equation (1.10). The analysis of the GAME
algorithm relies heavily on the analysis of the generic primal-dual approach. This
approach originates from the link-function methodology in computational optimization
[33, 34], and is related to the mirror descent approach in the optimization
community [35, 36]. The primal-dual Bregman optimization approach is widely
used in online optimization applications including portfolio selection [37, 38], online
learning [53], and boosting.

However, there is a major difference between the sparse approximation problem
and the problem of online convex optimization. In the sparse approximation problem,
the set $\mathcal{A} = \Delta_{\ell_0,\ell_1}(k, \tau)$ is not convex anymore; therefore, there is no guarantee
that an online convex optimization algorithm outputs a sparse strategy $\hat{\alpha}$. Hence, it
is not possible to directly translate the bounds from the online convex optimization
scheme to the sparse approximation scheme.

Moreover, as discussed in Lemma 1.7, there is also a major difference between
the Mindy player of the GAME algorithm and the Mindy of general online
convex optimization games. In the GAME algorithm, Mindy is not a black-box
adversary that responds with an update to her strategy based on Max's update.
Here, Mindy always performs a greedy update and finds the best strategy as
a response to Max's update. Moreover, our Mindy always finds a 1-sparse new
strategy. That is, she looks among all best responses to Max's update, and finds a
1-sparse strategy among them.

As we will see next, the combination of cooperativeness by Mindy and standard
ideas for bounding the regret in online convex optimization schemes enables us
to analyze the GAME algorithm for sparse approximation. The following theorem
bounds the regret loss of the primal-dual strategy in online convex optimization
problems and is proved in [32].

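Theorem 1.8. Let $q$ and $T$ be positive integers, and let $p = \frac{q}{q-1}$. Suppose that $\mathcal{R}$
is such that for every $P, Q \in \Xi_p$, $\mathcal{B}_{\mathcal{R}}(P, Q) \ge \|P - Q\|_p^2$, and let

$$G = \max_{\alpha \in \Delta_{\ell_1}(\tau)} \|\Phi\alpha - f\|_q. \tag{1.27}$$

Also assume that for every $P \in \Xi_p$, we have $\mathcal{B}_{\mathcal{R}}(P, P^1) \le D^2$. Suppose

$$\big((P^1, \alpha^1), \ldots, (P^T, \alpha^T)\big)$$

is the sequence of pairs generated by the GAME Algorithm after $T$ iterations with
an appropriately chosen regularization parameter $\eta$. Then

$$\max_{P \in \Xi_p} \frac{1}{T} \sum_{t=1}^{T} \mathcal{L}(P, \alpha^t) \le \frac{1}{T} \sum_{t=1}^{T} \mathcal{L}(P^t, \alpha^t) + \frac{DG}{2\sqrt{T}}.$$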


Proof. The proof of Theorem 1.8 is based on the geometric properties of the
Bregman functions, and is provided in [52]. □

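Next we use Theorem 1.8 to show that the GAME algorithm after $T$ iterations
finds a $T$-sparse vector $\hat{\alpha}$ with near-optimal value $\|\Phi\hat{\alpha} - f\|_q$.

Theorem 1.9. Let $q$ and $T$ be positive integers, and let $p = \frac{q}{q-1}$. Suppose that for
every $P, Q \in \Xi_p$, the function $\mathcal{R}$ satisfies $\mathcal{B}_{\mathcal{R}}(P, Q) \ge \|P - Q\|_p^2$, and let

$$G = \max_{\alpha \in \Delta_{\ell_1}(\tau)} \|\Phi\alpha - f\|_q. \tag{1.28}$$

Also assume that for every $P \in \Xi_p$, we have $\mathcal{B}_{\mathcal{R}}(P, P^1) \le D^2$. Suppose

$$\big((P^1, \alpha^1), \ldots, (P^T, \alpha^T)\big)$$

is the sequence of pairs generated by the GAME Algorithm after $T$ iterations with
the regularization parameter $\eta$ chosen as in Theorem 1.8. Let $\hat{\alpha} = \frac{1}{T}\sum_{t=1}^{T} \alpha^t$ be the
output of the GAME algorithm. Then $\hat{\alpha}$ is a $T$-sparse vector with $\|\hat{\alpha}\|_1 \le \tau$ and

$$\|\Phi\hat{\alpha} - f\|_q \le \min_{\alpha \in \Delta_{\ell_1}(\tau)} \|\Phi\alpha - f\|_q + \frac{DG}{2\sqrt{T}}. \tag{1.29}$$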


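Proof. It follows from Step 2 of Algorithm 1.1 that every $\alpha^t$ is 1-sparse and
$\|\alpha^t\|_1 = \tau$. Therefore, $\hat{\alpha} = \frac{1}{T}\sum_{t=1}^{T} \alpha^t$ has at most $T$ non-zero entries and
moreover $\|\hat{\alpha}\|_1 \le \frac{1}{T}\sum_{t=1}^{T} \|\alpha^t\|_1 \le \tau$. Therefore $\hat{\alpha}$ is in $\Delta_{\ell_0,\ell_1}(T, \tau)$.

Next we show that Equation (1.29) holds for $\hat{\alpha}$. Let $\bar{P} = \frac{1}{T}\sum_{t=1}^{T} P^t$. Observe
that

$$\begin{aligned}
\min_{\alpha \in \Delta_{\ell_1}(\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha)
&\overset{(e)}{=} \max_{P \in \Xi_p} \min_{\alpha \in \Delta_{\ell_1}(\tau)} \mathcal{L}(P, \alpha) \\
&\overset{(f)}{\ge} \min_{\alpha \in \Delta_{\ell_1}(\tau)} \mathcal{L}(\bar{P}, \alpha) \\
&\overset{(g)}{\ge} \frac{1}{T} \min_{\alpha \in \Delta_{\ell_1}(\tau)} \sum_{t=1}^{T} \mathcal{L}(P^t, \alpha) \\
&\overset{(h)}{\ge} \frac{1}{T} \sum_{t=1}^{T} \min_{\alpha \in \Delta_{\ell_1}(\tau)} \mathcal{L}(P^t, \alpha)
 \overset{(i)}{=} \frac{1}{T} \sum_{t=1}^{T} \mathcal{L}(P^t, \alpha^t) \\
&\overset{(j)}{\ge} \max_{P \in \Xi_p} \mathcal{L}\Big(P, \frac{1}{T}\sum_{t=1}^{T} \alpha^t\Big) - \frac{DG}{2\sqrt{T}}.
\end{aligned}$$

Equality (e) is the Minimax Theorem (Theorem 1.6). Inequality (f) follows from
the definition of the max function. Inequalities (g) and (h) are consequences of
the bilinearity of $\mathcal{L}$ and the concavity of the min function. Equality (i) is valid by the
definition of $\alpha^t$, and Inequality (j) follows from Theorem 1.8. As a result

$$\|\Phi\hat{\alpha} - f\|_q = \max_{P \in \Xi_p} \mathcal{L}(P, \hat{\alpha}) \le \min_{\alpha \in \Delta_{\ell_1}(\tau)} \max_{P \in \Xi_p} \mathcal{L}(P, \alpha) + \frac{DG}{2\sqrt{T}}
= \min_{\alpha \in \Delta_{\ell_1}(\tau)} \|\Phi\alpha - f\|_q + \frac{DG}{2\sqrt{T}}.$$

□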


Remark 1.10. In general, different choices for the Bregman function may lead to
different convergence bounds with different running times to perform the new projections
and updates. For instance, a multiplicative update version of the algorithm
can be derived by using the Bregman divergence based on the Kullback-Leibler function,
and an additive update version of the algorithm can be derived by using the
Bregman divergence based on the squared Euclidean function.

Theorem 1.9 is applicable to any sensing matrix. Nevertheless, it does not
guarantee that the estimate vector $\hat{\alpha}$ is close enough to the target vector $\alpha^*$.
However, if the sensing matrix satisfies the RIP-$q$ property, then it is possible to
bound the data-domain error $\|\hat{\alpha} - \alpha^*\|_q$ as well.
Theorem 1.11. Let $q$, $k$, and $T$ be positive integers, let $\epsilon$ be a number in $(0,1)$,
and let $p = \frac{q}{q-1}$. Suppose that for every $P, Q \in \Xi_p$, the function $\mathcal{R}$ satisfies
$\mathcal{B}_{\mathcal{R}}(P, Q) \ge \|P - Q\|_p^2$, and let $\Phi$ be an $M \times N$ sensing matrix satisfying the
$(k + T, \epsilon)$ RIP-$q$ property. Let $\alpha^*$ be a $k$-sparse vector with $\|\alpha^*\|_1 \le \tau$, let $e_M$ be
an arbitrary noise vector in $\mathbb{R}^M$, and set $f = \Phi\alpha^* + e_M$. Let $G$, $D$, and $\eta$ be as in
Theorem 1.8, and let $\hat{\alpha}$ be the output of the GAME algorithm after $T$ iterations.
Then $\hat{\alpha}$ is a $T$-sparse vector with $\|\hat{\alpha}\|_1 \le \tau$ and

$$\|\hat{\alpha} - \alpha^*\|_q \le \frac{2\|e_M\|_q + \frac{DG}{2\sqrt{T}}}{1 - \epsilon}. \tag{1.30}$$


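Proof. Since $\hat{\alpha}$ is $T$-sparse and $\alpha^*$ is $k$-sparse, $\hat{\alpha} - \alpha^*$ is $(T + k)$-sparse. Therefore,
it follows from the RIP-$q$ property of the sensing matrix that

$$\begin{aligned}
(1 - \epsilon)\|\hat{\alpha} - \alpha^*\|_q &\le \|\Phi(\hat{\alpha} - \alpha^*)\|_q \le \|\Phi\hat{\alpha} - f\|_q + \|e_M\|_q \\
&\le \|\Phi\alpha^* - f\|_q + \frac{DG}{2\sqrt{T}} + \|e_M\|_q = 2\|e_M\|_q + \frac{DG}{2\sqrt{T}}.
\end{aligned} \tag{1.31}$$

□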


1.4 The CLASH Algorithm 


1.4.1 Hard Thresholding Formulations of Sparse Approximation 


As already stated, solving (1.2) is NP-hard, and an exhaustive search over the $\binom{N}{k}$ possible
support set configurations of the $k$-sparse solution is mandatory. Contrary to this
brute-force approach, hard thresholding algorithms [13] navigate
through the low-dimensional $k$-sparse subspaces, pursuing an appropriate support
set that minimizes the data error in (1.4). To achieve this, these approaches
apply greedy support set selection rules to iteratively compute and refine a putative
solution $\alpha_i$ using only first-order information $\nabla f(\alpha_{i-1})$ at each iteration $i$.

The Subspace Pursuit (SP) algorithm is a combinatorial greedy algorithm that
borrows from both Orthogonal Matching Pursuit (OMP) and Iterative Hard
Thresholding (IHT) [13] methods. A sketch of the algorithm is given in Algorithm
1.2. The basic idea behind SP consists in looking for a good support set by iteratively
collecting an extended candidate support set $\mathcal{S}_i$ with $|\mathcal{S}_i| \le 2k$ (Step 4) and then
finding the $k$-sparse vector $\alpha_{i+1}$ that best fits the measurements within the restricted
support set $\mathcal{S}_i$, i.e., the support set $\mathcal{X}_{i+1}$ satisfies $\mathcal{X}_{i+1} = \operatorname{supp}(\alpha_{i+1}) \subseteq \mathcal{S}_i$
(Steps 5-6).


Algorithm 1.2 Subspace Pursuit Algorithm

Input: $f$, $\Phi$, $k$, MaxIter. Output: $\hat{\alpha} \leftarrow \arg\min_{v : \operatorname{supp}(v) \subseteq \mathcal{X}_i} \|f - \Phi v\|_2^2$

In [32], Foucart improves the initial RIP conditions of the SP algorithm, which we
present here as a corollary:

Corollary 1.12 (SP Iteration Invariant). The SP algorithm satisfies the following
recursive formula:

$$\|\alpha_{i+1} - \alpha^*\|_2 \le \rho\|\alpha_i - \alpha^*\|_2 + c\|n\|_2, \tag{1.32}$$

where $c$ is a constant that depends on the isometry constants $\delta_{2k}$ and $\delta_{3k}$, and $\rho < 1$ given that
$\delta_{3k} < 0.38427$.

1.4.2 Algorithm Description 

In this section, we present the Clash algorithm, a Subspace Pursuit variant, as a
running example for our subsequent developments. We underline that norm constraints
can also be incorporated into alternative state-of-the-art hard thresholding
frameworks.


Algorithm 1.3 The Clash Algorithm

Input: $f$, $\Phi$, $\Delta_{\ell_0,\ell_1}(k, \tau)$, Tolerance, MaxIterations. Output: $\hat{\alpha}$.


The Clash algorithm approximates $\alpha^*$ according to the optimization formulation
(1.8) where $q = 2$. We provide a pseudo-code of an example implementation of
Clash in Algorithm 1.3. To complete the $i$-th iteration, Clash initially identifies
a $2k$ extended support set $\mathcal{S}_i$ to explore via the Active set expansion step (Step
1). The set $\mathcal{S}_i$ is constituted by the union of the support $\mathcal{X}_i$ of the current solution
$\alpha_i$ and an additional $k$-sparse support where the projected gradient onto $\Delta_{\ell_0}(k)$
can make the most impact on the loading vector, complementary to $\mathcal{X}_i$. Given $\mathcal{S}_i$, the
Greedy descent with least absolute shrinkage step (Step 2) solves a least-squares
problem over the $\ell_1$-norm constraint to decrease the data error $f(\alpha)$, restricted to
the active support set $\mathcal{S}_i$. In sequence, we project the $2k$-sparse solution of Step
2 onto $\Delta_{\ell_0}(k)$ to arbitrate the active support set via the Combinatorial selection
step (Step 3). Finally, Clash de-biases the result on the putative solution support
using the De-bias step (Step 4).
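
The four steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it assumes NumPy, solves the $\ell_1$-constrained least-squares subproblems with a fixed number of projected-gradient iterations, and applies the same constrained solver in the de-bias step. The names `project_l1_ball`, `constrained_ls` and `clash`, as well as the iteration counts and the test problem, are hypothetical.

```python
import numpy as np

def project_l1_ball(w, tau):
    """Euclidean projection onto the l1 ball of radius tau > 0."""
    if np.abs(w).sum() <= tau:
        return w.copy()
    u = np.sort(np.abs(w))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - tau) / np.arange(1, u.size + 1) > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def constrained_ls(f, Phi, support, tau, iters=200):
    """Projected gradient for min ||f - Phi v||_2^2, supp(v) in support, ||v||_1 <= tau."""
    A = Phi[:, support]
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / (largest singular value)^2
    v = np.zeros(len(support))
    for _ in range(iters):
        v = project_l1_ball(v - step * (A.T @ (A @ v - f)), tau)
    out = np.zeros(Phi.shape[1])
    out[support] = v
    return out

def clash(f, Phi, k, tau, max_iter=20, tol=1e-8):
    alpha = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        X = np.flatnonzero(alpha)                       # current support
        grad = Phi.T @ (Phi @ alpha - f)
        grad[X] = 0.0                                   # look outside the support
        S = np.union1d(X, np.argsort(np.abs(grad))[-k:])    # Step 1: expansion
        v = constrained_ls(f, Phi, S, tau)                  # Step 2: l1-constrained fit
        keep = np.sort(np.argsort(np.abs(v))[-k:])          # Step 3: keep top k
        nxt = constrained_ls(f, Phi, keep, tau)             # Step 4: de-bias (assumed form)
        done = np.linalg.norm(nxt - alpha) <= tol
        alpha = nxt
        if done:
            break
    return alpha

rng = np.random.default_rng(0)
M, N, k = 40, 80, 3
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
alpha_true = np.zeros(N)
alpha_true[[5, 17, 42]] = [1.0, -2.0, 1.5]
f = Phi @ alpha_true
alpha_hat = clash(f, Phi, k, tau=np.abs(alpha_true).sum())
print(np.flatnonzero(alpha_hat), np.linalg.norm(Phi @ alpha_hat - f))
```

By construction the output is $k$-sparse and satisfies the $\ell_1$-norm constraint; whether it coincides with the true support depends on the problem instance and on the subproblem accuracy.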


1.4.3 The CLASH Guarantees 


Clash iterations satisfy the following worst-case guarantee:

Theorem 1.13. [Iteration invariant] Let $\alpha^*$ be the true solution. Then, the $i$-th
iterate $\alpha_i$ of Clash satisfies the following recursion

$$\|\alpha_{i+1} - \alpha^*\|_2 \le \rho\|\alpha_i - \alpha^*\|_2 + c_1(\delta_{2k}, \delta_{3k})\|n\|_2, \tag{1.33}$$

where $c_1(\delta_{2k}, \delta_{3k})$ is a constant that depends only on the isometry constants
$\delta_{2k}$ and $\delta_{3k}$ (Equation (1.34); see [19] for its explicit form),
and $\rho = \frac{\delta_{3k} + \delta_{2k}}{\sqrt{1 - \delta_{2k}^2}} \sqrt{\frac{1 + 3\delta_{3k}^2}{1 - \delta_{3k}^2}}$. Moreover, when $\delta_{3k} < 0.3658$, the iterations are
contractive (i.e., $\rho < 1$).

A detailed proof of Theorem 1.13 can be found in [19]. Theorem 1.13 shows that
the isometry requirements of Clash are competitive with those of mainstream hard
thresholding methods, such as SP, even though Clash incorporates the $\ell_1$-norm
constraints. Furthermore, we observe improved signal reconstruction performance
compared to these methods, as shown in the Experiments section.


1.5 Experiments 

In this section, we provide experimental results to demonstrate the performances 
of the GAME and Clash Algorithms. 

1.5.1 Performance of the $\ell_\infty$ GAME algorithm

In this experiment, we fix $N = 1000$, $M = 200$ and $k = 20$, and generate a
$200 \times 1000$ Gaussian matrix $\Phi$. Each experiment is repeated independently 50 times.
We compare the performance of the $\ell_\infty$ GAME algorithm, which approximately
solves the non-convex problem

$$\underset{\alpha \in \Delta_{\ell_0,\ell_1}(k,\tau)}{\text{minimize}} \ \|\Phi^\top\Phi\alpha - \Phi^\top f\|_\infty \tag{1.35}$$

with state-of-the-art Dantzig Selector solvers [33] that solve the linear optimization

$$\underset{\alpha \in \Delta_{\ell_1}(\tau)}{\text{minimize}} \ \|\Phi^\top\Phi\alpha - \Phi^\top f\|_\infty. \tag{1.36}$$

The compressive measurements were generated in the presence of white Gaussian
noise. The noise vector consists of $M$ iid $\mathcal{N}(0, \sigma^2)$ elements, where $\sigma$ is swept over
a range of powers of ten (see Figure 1.3). Figure 1.3 compares the data-domain $\ell_2$-error
($\|\alpha^* - \hat{\alpha}\|_2 / \|\alpha^*\|_2$) of the GAME algorithm with the error of the $\ell_1$-magic algorithm and the
Homotopy algorithm [46], which are state-of-the-art Dantzig Selector optimizers. As
illustrated in Figure 1.3, as $\sigma$ increases, the GAME algorithm outperforms
the $\ell_1$-magic and Homotopy algorithms.
Figure 1.3: Signal approximation experiments with $\ell_1$-magic, Homotopy, and
GAME algorithms. The measurement noise standard deviation is swept over a range
of powers of ten, and the approximation error is measured as $\|\alpha^* - \hat{\alpha}\|_2 / \|\alpha^*\|_2$.


1.5.2 Performance of Clash Algorithm 


Noise resilience: We generate random realizations of the model $f = \Phi\alpha^* + n$ for
$N = 1000$, $M = 305$ and $k = 115$, where $k$ is known a priori and $\alpha^*$ admits the simple
sparsity model. We construct $\alpha^*$ as a $k$-sparse vector with iid $\mathcal{N}(0,1)$ elements
with $\|\alpha^*\|_2 = 1$. We repeat the same experiment independently for 50 Monte-Carlo
iterations. In this experiment, we examine the signal recovery performance
of Clash compared to the following state-of-the-art methods: i) Lasso (1.6) as a
projected gradient method, ii) Basis Pursuit using the SPGL1 implementation,
and iii) Subspace Pursuit. We test the recovery performance of the aforementioned
methods for various noise standard deviations; the empirical results are
depicted in Figure 1.4. We observe that the combination of hard thresholding with
norm constraints significantly improves the signal recovery performance over both
convex and combinatorial approaches.

Improved recovery using Clash: We generate random realizations of the
model $f = \Phi\alpha^* + n$ for $N = 500$, $M = 160$ and $k = \{57, 62\}$ for the noisy and the
noiseless case respectively, where $k$ is known a priori. We construct $\alpha^*$ as a $k$-sparse
vector with iid $\mathcal{N}(0,1)$ elements with $\|\alpha^*\|_2 = 1$. In the noisy case, we assume
$\|n\|_2 = 0.05$. We perform 500 independent Monte-Carlo iterations. We then sweep
$\tau$ and examine the signal recovery performance of Clash compared to the
same methods above. Note that, if $\tau$ is large, norm constraints have no impact on
recovery and Clash must admit identical performance to SP.


Figure 1.5 illustrates that the combination of hard thresholding with norm
constraints can improve the signal recovery performance significantly over convex-only
and hard thresholding-only methods. Clash perfectly recovers the signal when
the regularization parameter is close to $\|\alpha^*\|_1$. When $\tau \ll \|\alpha^*\|_1$ or $\tau \gg \|\alpha^*\|_1$,
the performance degrades.

Figure 1.4: Signal approximation experiments with Clash, Lasso, and BP
algorithms. The measurement noise standard deviation is swept over a range of
values, and the approximation error is measured as $\|\alpha^* - \hat{\alpha}\|_2$.

Figure 1.5: Improved signal recovery using Clash.


1.6 Conclusions 

We discussed two sparse recovery algorithms that explicitly leverage convex $\ell_1$ and
non-convex $\ell_0$ priors jointly. While the $\ell_1$ prior is conventionally motivated as the
"convexification" of the $\ell_0$ prior, we saw that this interpretation is incomplete: it
actually is a convexification of the $\ell_0$-constrained set with a maximum scale. We
also discovered that the interplay of these two seemingly related priors could lead
not only to strong theoretical recovery guarantees from weaker assumptions than
commonly used in sparse recovery, but also to improved empirical performance over
the existing solvers. To obtain our results, we reviewed some important topics from
the game theory, convex optimization and combinatorial optimization literature. We believe that
understanding and exploiting the interplay of such convex and non-convex priors
could lead to radically new, scalable regression approaches, which can leverage
decades of work in diverse theoretical disciplines.